694. Number of Distinct Islands
Medium

You are given an m x n binary matrix grid. An island is a group of 1's (representing land) connected 4-directionally (horizontal or vertical.) You may assume all four edges of the grid are surrounded by water.

An island is considered to be the same as another if and only if one island can be translated (and not rotated or reflected) to equal the other.

Return the number of distinct islands.

 

Example 1:

Input: grid = [[1,1,0,0,0],[1,1,0,0,0],[0,0,0,1,1],[0,0,0,1,1]]
Output: 1

Example 2:

Input: grid = [[1,1,0,1,1],[1,0,0,0,0],[0,0,0,0,1],[1,1,0,1,1]]
Output: 3

 

Constraints:

  • m == grid.length
  • n == grid[i].length
  • 1 <= m, n <= 50
  • grid[i][j] is either 0 or 1.
Accepted
93,568
Submissions
160,237
Seen this question in a real interview before?
Companies

Average Rating: 4.27 (90 votes)

Premium

Video Solution


Solution Article


Overview

There are two parts to this problem.

  1. Finding every island. We can do this with a straightforward depth-first search (DFS). Check out our relevant Explore Card if you are not already familiar with DFS.
  2. Determining whether or not two islands are the same. This part is more difficult, and is the focus of this article.


Approach 1: Brute Force

Intuition

Assume we've already used a DFS to make a list of islands, where each island is represented as a list of coordinates. We now need to determine how many of these islands actually have a unique shape.

Since two islands are the same if one can be translated to the other, we can translate each island so that it is as pushed as far into the top left as possible. By doing this, two islands that are the same shape, but in different locations, will now have identical coordinates. For example, if an island is made from cells [(2, 1), (3, 1), (1, 2), (2, 2)], it would become [(1, 0), (2, 0), (0, 1), (1,1)] when anchored at the top-left corner.

The 4-cell island described above is moved as far up and left as possible.

Similarly, the island made from cells [(2, 0), (3, 0), (1, 1), (2, 1)] will also become [(1, 0), (2, 0), (0, 1), (1,1)].

The 4-cell island described above is moved as far up and left as possible.

Algorithm

  1. Use DFS to make a list of islands, where each island is a list of coordinates.
  2. Initialize a count of the number of unique islands to 0.
  3. For each island, compare it cell-by-cell to every other island. If it is found to be unique, increment count by 1.
  4. Return the value of count.

Order doesn't matter, so the two islands [(0, 1), (0, 2)] and [(0, 2), (0, 1)] should be considered as identical. However, we can avoid the need to worry about order by ensuring that two islands of the same shape are initially discovered from the same relative cell. Then from there, the DFS will always visit the cells in the same relative order. This is easy to do: we can simply search for islands by iterating left to right, top to bottom. This way, each island will always be discovered from the leftmost cell in its top row. The diagram below shows the first cell discovered for each island, using this traversal order. Notice that islands of the same shape are first discovered from the same relative cell.

Various islands, and their first cell that will be discovered is marked.

We can also make one other clever observation: we can simply translate each island so that the first cell of the island that is discovered is on (0, 0). If, for example, an island contains the cells [(2, 3), (2, 4), (2, 5), (3, 5)], we subtract (2, 3) off each cell to get [(0, 0), (0, 1), (0, 2), (1, 2)].

Complexity Analysis

  • Time Complexity: O(M2N2)O(M^2 \cdot N^2).

    In the worst case, we would have a large grid, with many unique islands all of the same size, and the islands packed as closely together as possible. This would mean that for each island we discover, we'd be looping over the cells of all the other islands we've discovered so far. For example, here's a grid with M=10M = 10, N=10N = 10, and islands all of size 5.

    A 10-by-10 grid with 12 pentominoes (islands with 5 cells) fitted into it.

    A detailed analysis of this approach is well beyond the scope of LeetCode and job interviews. So the remainder of this discussion is an informal introduction to the key ideas that you would need to investigate in order to prove the time complexity, and is provded only for interest purposes.

    Shapes that are made by connecting square shaped cells together are called polyominos. Polynominos are grouped into sets of particular sizes. For example, the set of all polyominos made of 5 square cells (like the ones in the example just above) are called pentominoes.

    Worst cases use the smallest possible polyomino size that allows tiling at least half of the grid with all unique islands. This maximizes the number of times that we have to iterate over all cells of all islands.

    So, how many polyominos are there of each size? The answer to this is given by OEIS:A001168. As you can see, this is a very fast growing sequence - there are 3644636446 polyominos of size 1010! Just by looking at this, you can hopefully see that as the size of the grid we want to put islands in grows, the size of the islands we'll need to use grows very slowly! This relative small size maximizes the amount of repeated iteration we'll need to do over the islad list.

  • Space complexity: O(NM)O(N \cdot M).

    The seen set requires O(NM)O(N \cdot M) memory. Additionally, each cell with land requires O(1)O(1) space in the islands array.



Approach 2: Hash By Local Coordinates

Intuition

The previous approach is inefficient because the operation for determining whether or not an island is unique required looping through every coordinate of every island discovered so far.

Instead of comparing islands by looping over coordinates, we could simply calculate a hash for each island in such a way that ensured two identical islands have the same hash value. These hashes could then be put into a hash set. As sets don't store duplicates, the number of islands in the hash set once we're done would be equal to the number of unique islands.

Words of Advice: Confused by this approach? If you don't have much experience with hashing algorithms, it's unlikely that you would have been able to come up with this approach on your own. As practice makes perfect, we recommend checking out our Explore Card on Hash Tables once you're done with this problem. Hashing is a very powerful technique for reducing time complexities, and is an important part of designing complicated real world algorithms in many areas, such as artificial intelligence.

Implementation

The best way of implementing this is language-dependent.

In Java, we can actually represent each island as a HashSet of Pairs, with one Pair for each cell. We can then put all of the islands into another HashSet, and this will hash the HashSets that represent the islands.

In Python, there is a data structure called a frozenset that we have to use instead, as unlike Java, Python doesn't allow inserting a set into another set. A frozenset is an immutable set.

Complexity Analysis

Let MM be the number of rows, and NN be the number of columns.

  • Time Complexity : O(MN)O(M \cdot N).

    We're inserting each cell into a hash table (corresponding to the island it is a part of), and then inserting each of those hash tables into another hash table.

    The cost of inserting each of the MNM \cdot N cells into its initial hash table is O(1)O(1), so those insertions have a total complexity of O(MN)O(M \cdot N).

    To insert the "island" hash tables into the final hash table, each of them has to (within the library code) be hashed by a hash function. While often we assume that the process of hashing is O(1)O(1), in this we can't as the inputs to be hashed could be arbitrarily large. So instead, the cost of hashing them is linearly proportional to the number of cells in the hash table being hashed. Each cell is essentially getting hashed once in this process too though (as each can only be a part of one island), and so this part is also O(MN)O(M \cdot N).

    As both phases have a time complexity of O(MN)O(M \cdot N), this is the total time complexity.

    Be aware that the time complexity of this approach is dependent on a good hash function. The built-in hash functions for Java and Python seem okay, but I recommend being very careful. A poor hash function would lead to performance comparable to Approach 1.

  • Space complexity : O(MN)O(M \cdot N). The seen set is the biggest use of additional memory.



Approach 3: Hash By Path Signature

Intuition

Remember how we didn't need to sort islands in Approach 1? When we start a depth-first search on the top-left square of some island, the path taken by our depth-first search will be the same if, and only if, the shape is the same. We can exploit this by using the path as a hash.

Algorithm

Each time we discover the first cell in a new island, we initialize an empty string. Then each time dfs is called for that island, we firstly determine whether or not the cell being entered is un-visited land, in the same way as before. If it is, then we append the direction we entered it from to the string. For example, this is the path that our algorithm would follow to explore the following island.

The path taken to explore a large island.

This path will be encoded as "RDDRUURRUL".

There's one thing we need to be cautious of. The three islands below would have the same encoding of RDDDR.

3 islands that hash to the same path, using the scheme described above.

The solution to this is that we also need to record where we backtracked. This occurs each time we exit a call to the dfs(...) function. We'll do this by appending a 0 to the string.

The same 3 islands as above, also showing the back movements.

In this way, the islands will now have the encodings of RDDDR, RDDD0R, and RDDD00R.

Complexity Analysis

Let MM be the number of rows, and NN be the number of columns.

  • Time Complexity : O(MN)O(M \cdot N). Same as Approach 2.

  • Space complexity : O(MN)O(M \cdot N). Same as Approach 2.

Report Article Issue

Comments: 47
derreck503's avatar
Read More

For solution 1: Instead of doing shape.add((r - r0) * 2 * grid[0].length + (c - c0)); I store the coordinates as a string: String temp = (r - r0) + "," + (c - c0);. It's easier to read/understand IMO.

Here's my full solution:

class Solution {
    public int numDistinctIslands(int[][] grid) {
        if (grid == null || grid.length == 0) return 0;
        
        // Sets only keep unique data
        Set<Set<String>> set = new HashSet<>();
        
        for(int i = 0; i < grid.length; i++) {
            for (int j = 0; j < grid[i].length; j++) {
                if (grid[i][j] == 1) {
                    // Second set of i, j will be used to translate coordinates to top left of grid
                    Set<String> results = dfs(grid, i, j, i, j, new HashSet<>());
                    
                    // Add results to set
                    set.add(results);
                }
            }
        }
        
        // Will only contain coordinates of unique islands
        return set.size();
    }
    
    
    public Set<String> dfs(int[][] grid, int i, int j, int row, int col, Set<String> found) {
        if (i >= 0 && i < grid.length && j >= 0 && j < grid[i].length && grid[i][j] == 1) {
            
            // Mark as visited
            grid[i][j] = 0; 
            
            // Translate position to top left of grid and save coordinates as a string. StringBuilder would be faster btw.
            String temp = (i - row) + "," + (j-col);
            
            // Add to found set
            found.add(temp);
            
            // Find left nodes
            dfs(grid, i - 1, j, row, col, found);
            // Find right nodes
            dfs(grid, i + 1, j, row, col, found);
            // Find bottom nodes
            dfs(grid, i, j - 1, row, col, found);
            // Find top nodes
            dfs(grid, i, j + 1, row, col, found); 

        }
        
        return found;
    }
}

55
Show 5 replies
Reply
Share
Report
XinyuHou's avatar
Read More

Firstly, thanks for the answer.

From there, we only need to check how many unique shapes there ignoring permutations (so [(0, 0), (0, 1)] is the same as [(0, 1), (1, 0)]).

[(0, 0), (0, 1)] and [(0, 1), (1, 0)]), they are different.

We multiplied the number of rows by 2 * grid[0].length instead of grid[0].length because our local row-coordinate could be negative.

How could local row-coordinate be negative?

You iterate through the matrix row by row (from top to bottom), within rows, it's from left to right. so in (r - r0) * 2 * grid[0].length + (c - c0), r- r0 should always be >= 0. while c - c0 is in [1 - grid[0].length, grid[0].length - 1]

29
Show 6 replies
Reply
Share
Report
MockCode's avatar
Read More

Why do you have "shape.add(0);" in the function explore of the second approach?

27
Show 3 replies
Reply
Share
Report
vishym77's avatar
Read More

This problem is hard.

23
Reply
Share
Report
awice's avatar
Read More

@jwgoh Corrected, thanks.

@kk636 (shape.add(0))
Say we have shapes like

A = [[1,0],
     [1,1]]
B = [[1,1],
     [1,0]]

Without shape.add(0), which records when we exit the function, these two shapes have the same path signature of [0, 1, 3]. This is because we don't know whether the east ("3") direction marked in the shape means east of the top left corner, or east of the bottom left corner.

When we record exiting the function, they will have the signature [0, 1, 3, 0, 0, 0] and [0, 1, 0, 3, 0, 0] respectively. The 0 basically functioned as an "escape" or "backwards" move when describing the path - it says take the cursor that you have, and go back to the square you were on. We could interpret these path signatures as [SOUTH, EAST, ESCAPE, ESCAPE, ESCAPE], and [SOUTH, ESCAPE, EAST, ESCAPE, ESCAPE] if we wanted to reconstruct the path.

@kk636 ((r - r0, c - c0) tuples)

We need to convert these tuples to an integer with any injective function (injective just means two different tuples don't convert to the same integer.)

Typically, the function that people might choose is (r, c) -> (r * num_columns + c). This is because they can take that result V and inspect V // num_columns, V % num_columns to reconstruct the answer. Instead of choosing the constant num_columns, we could have chosen any bigger number as well.

Here, because we are dealing with local coordinates, c >= 0 is not guaranteed: it could be anywhere inside the interval [-num_columns + 1, num_columns - 1]. However, r >= 0 is guaranteed because of how we found the shape (we looked for land from top to bottom, left to right.)

In the end, we chose the function (r, c) -> (r * (2 * num_columns) + c). No two local coordinates will have the same result, which is all we needed.

28
Show 5 replies
Reply
Share
Report
MockCode's avatar
Read More

Can you explain how you converted our tuples (r - r0, c - c0) to integers in the Java solution? I don't quite understand.

8
Reply
Share
Report
jwgoh's avatar
Read More

For example, if an island is made from squares [(2, 3), (2, 4), (3, 4)], we can think of this shape as [(0, 0), (0, 1), (1, 0)] when anchored at the top-left corner.

Should be [(0, 0), (0, 1), (1, 1)]

11
Reply
Share
Report
srishti7's avatar
Read More

We can optimize the second solution and reduce space by just changing the visited indices from to 0 from 1 in the explore method.
This way we do not need to keep track of the seen matrix.

5
Reply
Share
Report
vegito2002's avatar
Read More

Can we rely on the hash of a HashSet being unique/commensurate in terms of all its contents?

I was so obsessed with calculating a hash for shape in my own way, and did't realize I could have just used shape itself all the time.

3
Show 2 replies
Reply
Share
Report
jishenli1227's avatar
Read More

Thanks for the solution, but I'm confused about why we don't allow negative values in the set?

2
Show 1 reply
Reply
Share
Report

You don't have any submissions yet.

694/1883
Contribute
|||
Saved
#651 4 Keys Keyboard
Medium
#652 Find Duplicate Subtrees
Medium
#653 Two Sum IV - Input is a BST
Easy
#654 Maximum Binary Tree
Medium
#655 Print Binary Tree
Medium
#656 Coin Path
Hard
#657 Robot Return to Origin
Easy
#658 Find K Closest Elements
Medium
#659 Split Array into Consecutive Subsequences
Medium
#660 Remove 9
Hard
#661 Image Smoother
Easy
#662 Maximum Width of Binary Tree
Medium
#663 Equal Tree Partition
Medium
#664 Strange Printer
Hard
#665 Non-decreasing Array
Medium
#666 Path Sum IV
Medium
#667 Beautiful Arrangement II
Medium
#668 Kth Smallest Number in Multiplication Table
Hard
#669 Trim a Binary Search Tree
Medium
#670 Maximum Swap
Medium
#671 Second Minimum Node In a Binary Tree
Easy
#672 Bulb Switcher II
Medium
#673 Number of Longest Increasing Subsequence
Medium
#674 Longest Continuous Increasing Subsequence
Easy
#675 Cut Off Trees for Golf Event
Hard
#676 Implement Magic Dictionary
Medium
#677 Map Sum Pairs
Medium
#678 Valid Parenthesis String
Medium
#679 24 Game
Hard
#680 Valid Palindrome II
Easy
#681 Next Closest Time
Medium
#682 Baseball Game
Easy
#683 K Empty Slots
Hard
#684 Redundant Connection
Medium
#685 Redundant Connection II
Hard
#686 Repeated String Match
Medium
#687 Longest Univalue Path
Medium
#688 Knight Probability in Chessboard
Medium
#689 Maximum Sum of 3 Non-Overlapping Subarrays
Hard
#690 Employee Importance
Easy
#691 Stickers to Spell Word
Hard
#692 Top K Frequent Words
Medium
#693 Binary Number with Alternating Bits
Easy
#694 Number of Distinct Islands
Medium
#695 Max Area of Island
Medium
#696 Count Binary Substrings
Easy
#697 Degree of an Array
Easy
#698 Partition to K Equal Sum Subsets
Medium
#699 Falling Squares
Hard
#700 Search in a Binary Search Tree
Easy